Analysis of ecological context variables and regression models for predicting the relationship (slope) between lake water level and cumulative precipitation deviation, using the 40-lake data set generated on Feb. 27, 2019. These analyses are driven by semi-informed decisions about which ecological context variables may be important factors influencing water level changes in lakes and their response to precipitation.

The Data

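The data in this analysis are slope values from 40 lakes, where each slope describes the relationship between a lake's water level and cumulative precipitation deviation.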

Time series plots of the 40 lakes used in these analyses
Relationship between centered water levels and centered long-term cumulative precipitation values

The lakes were chosen by the following criteria:

  1. 8+ years of water level observations
  2. A range in precipitation associated with the water level observations that spans the q15 to q85 interval (15th to 85th percentile) of the entire observed precipitation record (1920-present). The q15 and q85 are represented by the vertical lines in the above figure.
  3. Eight additional lakes were discarded after this selection because of known issues with the data or ecologically implausible relationships (e.g., water levels declining during periods of increased precipitation).
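
The first two criteria could be applied along the following lines. This is only a sketch: the data frames `wl` and `precip`, their column names (`lake_id`, `year`, `precip_dev`), and all values are made-up placeholders rather than the study data, and it assumes that criterion 2 means a lake's precipitation range must cover the q15 to q85 interval of the full record.

```r
# Hypothetical inputs (made-up values, not the study data):
#   precip: long-term precipitation record, one row per year
#   wl:     one row per lake-year that has a water level observation
set.seed(1)
precip <- data.frame(year = 1920:2018, precip_dev = rnorm(99))
wl <- data.frame(
  lake_id = rep(c("A", "B", "C"), times = c(30, 5, 12)),
  year    = c(1989:2018, 2014:2018, 2007:2018)
)
wl$precip_dev <- precip$precip_dev[match(wl$year, precip$year)]

# q15 and q85 of the whole precipitation record
q <- quantile(precip$precip_dev, probs = c(0.15, 0.85))

# Per-lake summary: number of observation years and precipitation range
lake_summary <- do.call(rbind, lapply(split(wl, wl$lake_id), function(d) {
  data.frame(
    lake_id    = d$lake_id[1],
    n_years    = length(unique(d$year)),
    precip_min = min(d$precip_dev),
    precip_max = max(d$precip_dev)
  )
}))

# Criterion 1: 8+ years of observations
# Criterion 2 (as assumed here): observed range covers q15..q85
lake_summary$keep <- lake_summary$n_years >= 8 &
  lake_summary$precip_min <= q[[1]] &
  lake_summary$precip_max >= q[[2]]

selected <- lake_summary$lake_id[lake_summary$keep]
```

The third criterion (manual removal of lakes with known data problems) is a judgment call and is not represented in the sketch.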

The ecological context variables used to explain the among-lake variation in the slope of the water level versus precipitation relationships shown above were MaxDepth, W_PERM, W_DARCY, cond, elevation_difference, Area, and r_forest; slope is the response variable. Forest was chosen as the land-use variable because it is strongly correlated with the other land-use variables. We also chose to use land-use characteristics calculated for the riparian zone (30 m buffer) around each lake.

Correlation plot of land-use variables

Biplots of slope vs potential driver variables

Model Selection

We tested a variety of model selection approaches to explain the among-lake variation in slopes using the ecological context variables, including black-box approaches such as random forest. For ease of interpretation, we have chosen to rely on linear models. Model selection was conducted with the glmulti package to identify the best models with 0 to 4 terms, allowing two-way interactions. Because of the small sample size and the risk of overfitting, we limited each model to 4 terms, which could be an interaction between two variables plus one additional variable (3 variables) or 4 variables with no interaction.
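
As an illustration of the ranking criterion, the sketch below computes AICc by hand in base R and ranks a few candidate formulas. It does not call glmulti and does not reproduce its exhaustive search; the data frame `dat_sim` is simulated with made-up values (only the variable names come from this analysis), and the helper `aicc()` is a hypothetical name introduced here.

```r
# AICc = AIC + 2k(k + 1) / (n - k - 1), where k counts all estimated
# parameters (regression coefficients plus the residual variance)
aicc <- function(fit) {
  n <- nobs(fit)
  k <- attr(logLik(fit), "df")
  AIC(fit) + (2 * k * (k + 1)) / (n - k - 1)
}

# Simulated stand-in data (made-up values, not the 40-lake data set)
set.seed(42)
n <- 40
dat_sim <- data.frame(
  W_DARCY              = runif(n, 0, 500),
  cond                 = runif(n, 10, 300),
  elevation_difference = runif(n, 0, 60)
)
dat_sim$slope <- 1 + 0.015 * dat_sim$elevation_difference -
  2e-05 * dat_sim$cond * dat_sim$W_DARCY + rnorm(n, sd = 0.5)

# A handful of candidate models, from intercept-only up to 4 terms
candidates <- c(
  "slope ~ 1",
  "slope ~ 1 + elevation_difference",
  "slope ~ 1 + cond + elevation_difference",
  "slope ~ 1 + W_DARCY + cond + elevation_difference + cond:W_DARCY"
)

fits <- lapply(candidates, function(f) lm(as.formula(f), data = dat_sim))

# Rank candidates from lowest (best) to highest AICc
ranking <- data.frame(model = candidates, aicc = sapply(fits, aicc))
ranking <- ranking[order(ranking$aicc), ]
print(ranking)
```

With only 40 lakes, the small-sample correction term grows quickly as terms are added, which is one reason to cap the model size.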

Model Selection formulas and associated AICc values
AICc      Model
63.14421  slope ~ 1 + W_DARCY + cond + elevation_difference + cond:W_DARCY
65.78285  slope ~ 1 + W_DARCY + elevation_difference + r_forest + r_forest:W_DARCY
67.41354  slope ~ 1 + W_DARCY + cond + cond:W_DARCY
68.59125  slope ~ 1 + W_DARCY + cond + Area + cond:W_DARCY
68.85866  slope ~ 1 + cond + elevation_difference + elevation_difference:cond
69.55145  slope ~ 1 + MaxDepth + W_DARCY + cond + cond:W_DARCY
70.16676  slope ~ 1 + W_PERM + W_DARCY + cond + cond:W_DARCY
70.17875  slope ~ 1 + MaxDepth + cond + elevation_difference + elevation_difference:cond
71.03460  slope ~ 1 + W_DARCY + Area + r_forest + r_forest:W_DARCY
71.19999  slope ~ 1 + cond + elevation_difference
71.31015  slope ~ 1 + W_DARCY + cond + elevation_difference + elevation_difference:cond
71.98504  slope ~ 1 + elevation_difference
72.75802  slope ~ 1 + MaxDepth + cond + elevation_difference
72.90693  slope ~ 1 + cond + elevation_difference + Area
73.25866  slope ~ 1 + elevation_difference + Area
73.42975  slope ~ 1 + W_DARCY + cond + elevation_difference
73.55986  slope ~ 1 + elevation_difference + r_forest
73.63163  slope ~ 1 + MaxDepth + elevation_difference
73.81450  slope ~ 1 + cond + elevation_difference + r_forest
73.82182  slope ~ 1 + W_PERM + cond + elevation_difference
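
To make the spacing between candidates easier to read, the AICc values above can be converted to differences from the best model and to Akaike weights. The sketch below uses only the 20 values listed in the table, so the weights are relative to this listed set and not to every model that was searched. The best forest-based model (second in the list) is about 2.64 AICc units behind the best conductivity-based model.

```r
# AICc values copied from the table above, in the same order
aicc <- c(63.14421, 65.78285, 67.41354, 68.59125, 68.85866,
          69.55145, 70.16676, 70.17875, 71.03460, 71.19999,
          71.31015, 71.98504, 72.75802, 72.90693, 73.25866,
          73.42975, 73.55986, 73.63163, 73.81450, 73.82182)

# Difference from the best (lowest) AICc
delta <- aicc - min(aicc)

# Akaike weights: relative support within this set of 20 models
weight <- exp(-delta / 2) / sum(exp(-delta / 2))

# Top five models with their deltas and weights
head(data.frame(rank   = seq_along(aicc),
                aicc   = aicc,
                delta  = round(delta, 2),
                weight = round(weight, 3)), 5)
```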

Model Performance [Conductivity]

## 
## Call:
## lm(formula = slope ~ elevation_difference + cond * W_DARCY, data = dat_input)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8645 -0.2267 -0.0636  0.2278  1.0575 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.112e+00  2.757e-01   4.034 0.000283 ***
## elevation_difference  1.564e-02  6.021e-03   2.597 0.013652 *  
## cond                 -1.222e-03  1.182e-03  -1.034 0.308196    
## W_DARCY               1.623e-03  8.631e-04   1.880 0.068445 .  
## cond:W_DARCY         -2.355e-05  6.403e-06  -3.677 0.000785 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4749 on 35 degrees of freedom
## Multiple R-squared:  0.5133, Adjusted R-squared:  0.4577 
## F-statistic: 9.228 on 4 and 35 DF,  p-value: 3.361e-05
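
Because the model contains a cond:W_DARCY interaction, the effect of W_DARCY on slope depends on conductivity. The sketch below writes out the fitted equation using the rounded coefficients printed above; the helper names (`predict_slope`, `darcy_effect`) are introduced here for illustration and are not part of the original analysis. With these rounded estimates the W_DARCY effect changes sign at a conductivity of roughly 69 (in the units of `cond`), although the individual cond and W_DARCY terms are imprecisely estimated, so this value should be read loosely.

```r
# Coefficients copied (rounded) from the summary output above
b <- c(intercept            =  1.112,
       elevation_difference =  1.564e-02,
       cond                 = -1.222e-03,
       W_DARCY              =  1.623e-03,
       cond_W_DARCY         = -2.355e-05)

# Fitted equation: predicted slope for given predictor values
predict_slope <- function(elevation_difference, cond, W_DARCY) {
  unname(b["intercept"] +
         b["elevation_difference"] * elevation_difference +
         b["cond"] * cond +
         b["W_DARCY"] * W_DARCY +
         b["cond_W_DARCY"] * cond * W_DARCY)
}

# Change in slope per unit of W_DARCY at a given conductivity
darcy_effect <- function(cond) {
  unname(b["W_DARCY"] + b["cond_W_DARCY"] * cond)
}

# Conductivity at which the W_DARCY effect changes sign
cond_zero <- unname(-b["W_DARCY"] / b["cond_W_DARCY"])

# Adjusted R-squared recomputed from R-squared with n = 40 and 4 terms
adj_r2 <- 1 - (1 - 0.5133) * (40 - 1) / (40 - 4 - 1)
```

The recomputed `adj_r2` agrees with the adjusted R-squared reported in the output, which confirms that the fit is based on all 40 lakes.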

Comparison of Extrapolation Space [Conductivity]

Comparison of the distribution of values and predicted slopes for the modeled population (observed) and the extrapolation population (extrapolation) to which the general model was applied to extrapolate slopes.

While it is clear that our model population (the lakes used to build the model) is missing some of the extreme values, this does not appear to influence the predicted slopes substantially. The extrapolated slopes (n = 455) are largely bounded by the slopes observed in the modeled population.
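
One simple way to put a number on "largely bounded" is the share of extrapolation values that fall inside the range seen in the modeled population. The sketch below is an assumption about how such a check could look, not the method used for the figure; the data frames `observed` and `extrap` hold made-up values, and `share_within()` is a hypothetical helper.

```r
# Made-up stand-ins: 40 modeled lakes and 455 extrapolation lakes
set.seed(7)
observed <- data.frame(cond       = runif(40, 20, 250),
                       pred_slope = rnorm(40, 1.2, 0.5))
extrap   <- data.frame(cond       = runif(455, 5, 400),
                       pred_slope = rnorm(455, 1.2, 0.4))

# Share of new values that fall inside the range of the observed values
share_within <- function(obs, new) {
  r <- range(obs, na.rm = TRUE)
  mean(new >= r[1] & new <= r[2], na.rm = TRUE)
}

# One coverage value per variable (predictor or predicted slope)
coverage <- sapply(names(observed), function(v) {
  share_within(observed[[v]], extrap[[v]])
})
print(round(coverage, 2))
```

A value near 1 for a predictor means the model is rarely applied outside the conditions it was fitted to; lower values flag variables where extrapolation deserves more caution.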

Model Performance [Forest]

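We also ran the models using riparian forest land-use data instead of conductivity. The best forest-based model in the selection table (AICc 65.78) performed only slightly worse than the best conductivity-based model (AICc 63.14), and it may allow extrapolation to lakes without conductivity data.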

Comparison of Extrapolation Space [Forest]

Comparison of the distribution of values and predicted slopes for the modeled population (observed) and the extrapolation population (extrapolation) to which the general model was applied to extrapolate slopes.

Comparison of predictions generated using Conductivity and Riparian Forest Cover